L2W decoding
- Given compressed string
& initial dictionary, uncompress the string
- Run compression algorithm
with the role of the string swapped with the indices
Try at
home
referees_ref_refs
BWT aka Block - sorting compression
- Reorder string so it is
easier to compress
- Repeated substrings becomes
runs of characters
- Sort all shifts
lexicographically ($ is smallest)
- Output
last column of character (last character of the shifts in sorted order.
Decoding
- Basic idea: follow indices
Limits on compression
- Why can't we compress until
1 bit?
- Entropy limit from
information theory
- There is a lower bound on
the # of bits needed to represent a string
- Need to distinguish all
possible strings
- For codes encoding 1
character per code word limit is
- Huffman codes obtain this if
we ignore floors & ceilings
There
are bounds for other cases
- Lossless compression works
by looking for patterns and writing it succinctly (using small # of bits)
- Need dictionary or some set
of rules for decoding
Drawing
KD tress
Cumulative
- Most in MCs (1st 2 pages)
Large
emphasis on compression